DATA COLLECTION, REDUCTION,
AND ANALYSIS PLAN

by

William J. Dunlay, Jr.


I. INTRODUCTION

Background

This is the second report by the writer under the general title Airport Improvement Task Force Delay Study. The first was the Delay Model Validation Plan, dated August 18, 1977. This second plan was prepared under supply contract No. W1-77-2412-1 with the Office of Systems Engineering Management of the U.S. Federal Aviation Administration.

The validation effort addressed by this plan is part of Phase I of contract No. DOT-FA77WA-3961 between the U.S. Federal Aviation Administration and Peat, Marwick, Mitchell and Company (PMM & Co.). The objective of the validation is to test whether the PMM & Co. delay simulation model is satisfactory (to the Technical Officer) for its intended application in Phase II of the contract, namely delay estimation in support of six Airport Improvement Task Forces.

Purpose

The Delay Model Validation Plan presented an overall outline of the validation procedure. This second plan describes the approach taken to the collection of data for the validation (both for model input and for comparisons with model output), the reduction of those data, and the statistical comparisons of model estimates with the corresponding observed quantities.



Scope

This plan is process-oriented; that is to say, it does not present results of data collection, reduction, or analysis. Rather, it describes the methodology followed in these three processes.

The major emphasis of this plan is on the statistical analysis of the data and the hypothesis testing associated with comparing model estimates with collected data. The data collection and reduction steps are covered in less detail.

The remainder of this report is organized into three major sections: (1) data collection, (2) data reduction, and (3) statistical analysis of the data.

II. DATA COLLECTION

Description of Sample

Due to time and manpower constraints, only ten (10) days of data were collected. The first of these was used for training the data takers, leaving 9 days of useful data.

Data were collected on the following days:

(1) Monday, 20 June 1977 - training

(2) Tuesday, 21 June 1977 - Friday, 24 June 1977, inclusive

(3) Monday, 27 June 1977 - Friday, 1 July 1977, inclusive

On each day, data were collected during the periods 8:00 a.m. - 11:00 a.m. and 1:00 p.m. - 4:00 p.m., all local times, i.e., CDT. These periods correspond to 13:00 - 16:00 and 18:00 - 21:00 GMT, respectively.
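As a quick arithmetic check on the schedule above, the following Python sketch lists the collection days and converts the two local collection periods to GMT. It treats CDT as a fixed offset of UTC-5, which is an assumption that holds for June and July. The names `collection_days` and `to_gmt` are illustrative only.

```python
from datetime import date, datetime, timedelta, timezone

# CDT treated as a fixed UTC-5 offset (valid for the June-July 1977 collection period).
CDT = timezone(timedelta(hours=-5), "CDT")

# The ten collection days listed above; the first (20 June) was for training only.
collection_days = (
    [date(1977, 6, 20)]
    + [date(1977, 6, 21) + timedelta(days=k) for k in range(4)]  # Tue 21 - Fri 24 June
    + [date(1977, 6, 27) + timedelta(days=k) for k in range(5)]  # Mon 27 June - Fri 1 July
)
useful_days = collection_days[1:]

# Local collection periods: 8:00-11:00 a.m. and 1:00-4:00 p.m. CDT.
PERIODS_LOCAL = [(8, 11), (13, 16)]


def to_gmt(day, hour):
    """Convert a whole local (CDT) hour on the given day to a GMT 'HH:MM' string."""
    local = datetime(day.year, day.month, day.day, hour, tzinfo=CDT)
    return local.astimezone(timezone.utc).strftime("%H:%M")


for day in useful_days:
    gmt = [f"{to_gmt(day, a)}-{to_gmt(day, b)}" for a, b in PERIODS_LOCAL]
    print(day.strftime("%a %d %b %Y"), ", ".join(gmt))
```

Running it prints nine lines, each with 13:00-16:00 and 18:00-21:00 GMT. The calendar check also confirms that 20 June and 27 June 1977 both fell on a Monday.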

Runway Configurations

Chicago's O'Hare International Airport has a total of twelve major runway ends (plus a couple of minor ones), as shown in Fig. 1. The six major runways are arranged in three sets of parallels: (9R/27L, 9L/27R); (4L/22R, 4R/22L); and (14L/32R, 14R/32L). Although there are many possible runway-use combinations, there are a number of preferred ones, including (but not limited to):

(1) Arrivals on 14R, 22R, 22L
Departures on 27L, 22L

(2) Arrivals on 27L, 27R, 32L
Departures on 32L, 32R

(3) Arrivals on 9L, 9R, 4R
Departures on 4L, 4R

(4) Arrivals on 14R, 14L, 9R or 4R
Departures on 9L, 27L

In fact, however, runway-use configurations at O'Hare are quite variable. While there may be a dominant use pattern during a certain period, a few other runway uses are nearly always mixed in on an occasional basis. It was therefore difficult to anticipate which runway-use configurations would be in effect during the data collection period. Instead, the surveyors had to wait and see which were used and then identify those used most frequently. Runway-use configurations, together with the times of day they were in service, were obtained from the ATC tower logs.

The airside delay simulation model being tested, for all practical purposes, simulates only one runway-use configuration at a time. This emphasizes the importance of obtaining sufficient data for at least a couple of stable runway-use configurations for the validation process.

It is felt that three hours of data for a given (approximately stable) runway configuration are sufficient for that configuration to be useful in the subsequent statistical analyses and comparisons with model estimates. Comparisons will be performed for all such reliable configurations identified from the data collected.
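The screening described above can be sketched as follows. The sketch assumes a hypothetical tower-log excerpt of (start, end, configuration) entries. It also interprets "three hours of data" as total time in service across all collection periods. The plan does not say whether the hours must be contiguous, so that reading is an assumption.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical excerpt of runway-use entries transcribed from ATC tower logs.
tower_log = [
    ("1977-06-21 08:00", "1977-06-21 11:00", "arr 14R/22R/22L; dep 27L/22L"),
    ("1977-06-21 13:00", "1977-06-21 14:10", "arr 27L/27R/32L; dep 32L/32R"),
    ("1977-06-21 14:10", "1977-06-21 16:00", "arr 9L/9R/4R; dep 4L/4R"),
    ("1977-06-22 08:00", "1977-06-22 11:00", "arr 14R/22R/22L; dep 27L/22L"),
]


def hours_by_configuration(entries):
    """Total hours each runway-use configuration was in service."""
    totals = defaultdict(float)
    for start, end, config in entries:
        span = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        totals[config] += span.total_seconds() / 3600.0
    return dict(totals)


def usable_configurations(entries, min_hours=3.0):
    """Configurations with at least `min_hours` of data, most-used first."""
    totals = hours_by_configuration(entries)
    keep = [c for c, h in totals.items() if h >= min_hours]
    return sorted(keep, key=lambda c: totals[c], reverse=True)


print(usable_configurations(tower_log))
```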


Data Sources and Collection Techniques

There were ten distinct sources of data utilized in the data collection effort:

(1) field observation

(2) airline data

(3) ATC voice tapes

(4) ATC departure strips

(5) ARTS-III tapes

(6) ATC tower personnel

(7) existing PMM & Co. data

(8) Official Airline Guide

(9) ASDE films

(10) ADR data

Each of the first eight sources was a primary source for at least one major piece of data; the last two served as backup data for cross-checking and for matching ambiguous data items.

Field Observation. The following variables were observed directly in the field:

(1) aircraft identifications (arrivals & departures)

(2) lift-off times

(3) roll times

(4) intersection times (arrivals & departures)

(5) runway exit used and exit times (arrivals)

(6) departure-queue length

Several data collection forms were developed at NAFEC. Figure 2 shows a two-part form designed for recording flight details for both arrivals and departures. The upper half of the form was used to record the aircraft identification, runway used, intersection time (if any), exit time, and exit used by each arrival. The bottom half of the form was used to record data for each departure: identification, runway used, roll time, intersection time (if any), and lift-off time.

Figure 3 shows a form designed for recording data on departure queue lengths. On this form, the times that departures entered and left each queue were recorded. A plus (+) sign beside the aircraft identification indicates an aircraft entering the departure queue; a minus (-) sign indicates one leaving it, i.e., taking off. There is also space to record the runway number and queue length. The forms of Fig. 3 were filled out in the FAA air traffic control tower.
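The plus/minus convention on the Fig. 3 form is enough to reconstruct queue length and time in queue. The minimal Python sketch below assumes each entry is transcribed as (seconds since the start of the period, aircraft identification, sign). The flight identifiers are hypothetical. Because this form covered only a sample of aircraft, a take-off without a matching "+" entry is ignored rather than treated as an error.

```python
def queue_length_history(entries):
    """Return [(time, queue length after the entry)] for one departure queue.

    entries: (time_s, aircraft_id, sign), sign "+" = entered queue, "-" = took off.
    """
    in_queue = set()
    history = []
    for t, ident, sign in sorted(entries):
        if sign == "+":
            in_queue.add(ident)
        elif sign == "-":
            in_queue.discard(ident)  # unmatched take-offs are ignored
        else:
            raise ValueError(f"unknown sign {sign!r}")
        history.append((t, len(in_queue)))
    return history


def time_in_queue(entries):
    """Seconds from '+' to '-' for each aircraft observed both entering and leaving."""
    joined, waits = {}, {}
    for t, ident, sign in sorted(entries):
        if sign == "+":
            joined[ident] = t
        elif sign == "-" and ident in joined:
            waits[ident] = t - joined.pop(ident)
    return waits


entries = [(0, "UA123", "+"), (60, "AA45", "+"), (150, "UA123", "-"),
           (200, "DL7", "+"), (260, "AA45", "-")]
print(queue_length_history(entries))
print(time_in_queue(entries))
```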

The third field collection form, Fig. 4, is for recording penalty-box delays and runway crossings. These were observed from the airline ramp control tower at O'Hare.

It was not possible to obtain a 100% sample of aircraft. A nearly complete record was attempted on the arrival/departure form of Fig. 2. The data on the second and third forms, however, were recorded for relatively small samples.

For field data collection purposes, the airport was divided roughly into two areas, either a so-called N-S division or an E-W division, depending on the runway configuration in use. These correspond roughly to the areas under each local controller in the O'Hare tower. Two persons were assigned to each area, usually one for arrivals and one for departures or, alternatively, one per runway, depending on the traffic situation.

Equipment for field data collection included clipboards, pencils, and blank data forms. Aircraft identifications were overheard in the ATC tower. A digital clock was used as the source of time data.

Airline Data. The following data were obtained directly from the airlines:

arrivals: (1) taxi-in times
(2) gate-arrival times

departures: (1) gate push-back times
(2) taxi-out times

Coded forms were provided by the airlines, and the airline data effort was coordinated through the Air Transport Association.

ATC Voice Tapes. The ATC voice tapes served as the primary source of data on initial taxi times, i.e., the times when permission to taxi is granted. In addition, they served as a backup source for runway number and pushback times for departures.

ATC Departure Strips. These were the primary source for request-for-taxi times. They also served as a cross-check for, and a means of filling gaps in, observed data on departures.

ARTS-III Tapes. Real-time radar recordings (tapes) are available from the Automated Radar Terminal System (ARTS-III) and the National Airspace System (NAS). These are being reduced using software being developed at the National Aviation Facilities Experimental Center. Traffic patterns into and out of O'Hare are being reconstructed from the recorded trajectory of each aircraft.

The ARTS-III data analysis is the primary source for the following:

for each arrival:
(1) aircraft class
(2) outer boundary time
(3) arrival fix identification
(4) arrival-fix time
(5) turn-on time
(6) threshold time (extrapolated)
(7) runway number

for each departure:
(1) aircraft class
(2) departure fix identification
(3) departure-fix time

In addition, the ARTS-III data serve as the basis for calculating scheduled threshold times for arrivals, using nominal, undelayed flying times (from arrival fix to threshold) deduced from ARTS-III trajectories in low-activity periods. The ARTS-III data are also used as the basis for deducing various model inputs, such as aircraft approach-speed distributions and distributions of minimum separations.
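The scheduled-threshold calculation can be sketched as below. The sketch rests on two assumptions. First, the nominal, undelayed fix-to-threshold flying time for each arrival-fix/runway pair is taken as the median of the low-activity ARTS-III travel times; the plan does not name the statistic. Second, an arrival's airspace delay is its actual threshold time minus its scheduled threshold time. Fix names and times are hypothetical.

```python
from statistics import median


def nominal_flying_times(low_activity):
    """Map (arrival_fix, runway) -> nominal undelayed flying time in seconds.

    low_activity: (arrival_fix, runway, travel_time_s) from low-activity periods.
    The median is an assumed choice of statistic.
    """
    samples = {}
    for fix, runway, tt in low_activity:
        samples.setdefault((fix, runway), []).append(tt)
    return {key: median(v) for key, v in samples.items()}


def scheduled_threshold_and_delay(arrivals, nominal):
    """arrivals: (flight, arrival_fix, runway, fix_time_s, threshold_time_s).

    Returns {flight: (scheduled_threshold_s, delay_s)}.
    """
    result = {}
    for flight, fix, runway, t_fix, t_threshold in arrivals:
        scheduled = t_fix + nominal[(fix, runway)]
        result[flight] = (scheduled, t_threshold - scheduled)
    return result


nominal = nominal_flying_times([("FIX_A", "14R", 600), ("FIX_A", "14R", 630),
                                ("FIX_A", "14R", 615)])
print(scheduled_threshold_and_delay([("UA1", "FIX_A", "14R", 1000, 1700)], nominal))
```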

ATC Tower Personnel. From discussions with ATC tower personnel, and observations made in the tower, the following data items were defined or checked:

(1) lengths of common approach path for each runway, each aircraft class, and different levels of activity

(2) locations of holding stacks

(3) gate-hold locations and procedures (if any)

(4) from ATC tower logs, data on runway-use configurations and hourly traffic counts (by arrival vs. departure and type of flight)

(5) locations of runway-crossing problems

(6) taxiway routes and two-way paths

(7) departure runway reassignment procedures

Existing PMM & Co. Data. The following quantities were defined from existing PMM & Co. data for O'Hare:

(1) runway-exit utilization distributions

(2) standard taxiway speeds by location

(3) link data

(4) runway-exit distances

The above data were checked against corresponding data from other sources.

Official Airline Guide. The Official Airline Guide (OAG) was the primary source of scheduled departure times.

ASDE Films. O'Hare International Airport has airport surface detection equipment (ASDE) radar. Films were taken of the ASDE scope during the data-collection periods. These were used in conjunction with the voice tapes to investigate departure-queue behavior, as a primary source for penalty-box delays and locations, and to fill in gaps in the observed data. ASDE thus also served as a secondary data source for cross-checking and matching ambiguous observed data.

ADR Data. Data from the Aircraft Delay Report (ADR) were obtained for the data-collection period. These served as additional backup for cross-checking data from other sources and for filling gaps.

III. DATA REDUCTION

Most of the data reduction activities are taking place at the National Aviation Facilities Experimental Center (NAFEC). The objective of the data reduction is to obtain reduced data in a format that facilitates the computation of required model inputs and of level-of-service measures that are comparable with model outputs.

Approximately ten computer programs are being or have been developed by personnel at NAFEC (Anthony Bradley, Robert P. Holladay, and Jacques Press of ANA-220). These programs are described below:

(1) COMP - compares arrival and departure records and reduces them to a format similar to the PMM & Co. model output. Outputs from this program comprise the arrival queue data set.

(2) CONVERT - converts data on arrivals into a queue data set, i.e., the time joining the queue and the time leaving the queue in selected time intervals.

(3) QSUM - computes average queue length, maximum and minimum queue lengths, and time in queue for both arrivals and departures, and by runway, using data broken down by 5-minute intervals from either the model or the field data collection.

(4) HISTO - constructs and prints histograms of taxi-in and taxi-out times for the airline data set.

(5) TAXI - using the taxiway route structure as input, this program computes expected values of taxi-in and taxi-out times from estimated probabilities of runway-exit and gate utilization.

(6) ARTS III - a program made up of four steps which, when run successively, reduce data found on the Automated Radar Terminal System (ARTS III) extractor tapes collected at the TRACON.

(a) The first step converts the original seven-track tapes to nine-track format.

(b) The second step filters out and collects radar track messages for arrival and departure tracks for the airport being measured. Specific portions of the airspace may be specified as desired.

(c) The third step constructs a radar-track history, i.e., the trajectories flown by the aircraft across selected portions (areas) of the airspace.

(d) The fourth step analyzes the time histories created in step (c) and determines crossing times at key points in the airport/airspace system, including: (1) the outer ring (about 45 miles out), (2) the arrival fix, (3) the common-approach point, and (4) the runway threshold. Travel times are also computed.

The output is a record summary for each aircraft.

(7) MERGE - reduces and synthesizes data from many sources (e.g., field observation, airlines, ARTS III, OAG, ADR, PMM & Co., ATC tower) and merges them into a single format for subsequent processing (see Fig. 5). Note in Fig. 5 that the data are merged to an 80-column format, and that each data field is identified by an arrow through the appropriate columns. Listed below each arrow is the source of that particular data item.

(8) ACSCHED - takes the merged data and generates an aircraft arrival and departure schedule by matching arrivals and departures according to one of several criteria (in descending, hierarchical order):

(a) by aircraft identity, i.e., flight number or call sign

(b) by assigned gate number, i.e., the same gate assigned to two aircraft of the same class within a short time interval

(c) by operation times of aircraft of the same class and airline

(d) by operation times.

(9) TRMSEP - using the outputs of MERGE, this program constructs statistical distributions of aircraft separations, in seconds between successive passings of the runway threshold. Separations are classified (a) by runway, (b) by dependent runway pair, (c) by aircraft-class pair, and (d) by type of operation, i.e., arrival or departure.

(10) TRVTIME - calculates means, standard deviations, and statistical distributions of aircraft travel times between the arrival fix and the runway threshold. Results are presented by "arrival-fix/runway-used" pair and by aircraft class.
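The grouped summaries produced by TRVTIME can be illustrated with a short Python sketch. The record layout (arrival fix, runway, aircraft class, travel time in seconds) and the sample values are hypothetical. The code reproduces only the means and standard deviations; it does not produce the distributions and is not the NAFEC program.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records, key):
    """Group travel times by key(record); return {group: (count, mean, sd)}.

    records: (arrival_fix, runway, aircraft_class, travel_time_s).
    sd is None when a group has a single record.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[3])
    return {
        k: (len(v), mean(v), stdev(v) if len(v) > 1 else None)
        for k, v in groups.items()
    }


records = [
    ("FIX_A", "14R", "large", 600),
    ("FIX_A", "14R", "large", 640),
    ("FIX_A", "14R", "heavy", 620),
    ("FIX_B", "22R", "large", 700),
]
by_pair = summarize(records, key=lambda r: (r[0], r[1]))  # arrival-fix/runway-used pair
by_class = summarize(records, key=lambda r: r[2])         # aircraft class
print(by_pair)
print(by_class)
```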

The data reduction process described above will provide the necessary "real-world" data for input to the simulation model and the response variables for comparison with model outputs.
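The expected-value calculation attributed to TAXI can be sketched as follows. Route times are built from link lengths and standard taxiway speeds, both of which appear among the existing PMM & Co. data. The sketch assumes that runway-exit choice and gate assignment are independent, so each exit-gate route is weighted by the product of the two probabilities. All link, exit, and gate names and values are hypothetical.

```python
import math


def route_time(links, length_ft, speed_fps):
    """Taxi time (s) along a route given as a sequence of link names."""
    return sum(length_ft[link] / speed_fps[link] for link in links)


def expected_taxi_in(p_exit, p_gate, route_time_s):
    """Expected taxi-in time, assuming exit and gate choices are independent."""
    for p in (p_exit, p_gate):
        if not math.isclose(sum(p.values()), 1.0):
            raise ValueError("probabilities must sum to 1")
    return sum(pe * pg * route_time_s[(e, g)]
               for e, pe in p_exit.items()
               for g, pg in p_gate.items())


length_ft = {"L1": 3000, "L2": 1200, "L3": 2400}
speed_fps = {"L1": 20, "L2": 15, "L3": 20}
times = {
    ("E1", "G1"): route_time(["L1", "L2"], length_ft, speed_fps),  # 150 + 80 s
    ("E1", "G2"): route_time(["L1", "L3"], length_ft, speed_fps),  # 150 + 120 s
    ("E2", "G1"): route_time(["L2"], length_ft, speed_fps),        # 80 s
    ("E2", "G2"): route_time(["L3"], length_ft, speed_fps),        # 120 s
}
print(expected_taxi_in({"E1": 0.6, "E2": 0.4}, {"G1": 0.5, "G2": 0.5}, times))
```

Taxi-out times would follow the same pattern, using gate and departure-runway probabilities in place of exit and gate probabilities.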

IV. STATISTICAL ANALYSIS OF THE DATA

Purpose and Scope

The purpose of the statistical tests of model output versus observed data is to obtain quantitative evidence of the model's ability to simulate airfield operations. These tests should, therefore, be viewed as "aids to decision making" rather than numerical criteria that the model must satisfy. The significance of the statistical comparisons described below can only be judged in the context of the anticipated application of the model and the types of decisions that the model might support.

The description that follows is intended to guide the process of final data reduction and the calculation of "real-world" comparison quantities, e.g., delays, queue sizes, travel times, and flow rates. It is also intended to guide the specification of the output of corresponding model estimates of those quantities. Finally, this section describes the actual steps in comparing model estimates with measured quantities and the interpretation of those comparisons.

Model Convergence

The internal convergence of the model estimates will be considered as a problem separate from (but not unrelated to) the validation comparisons with observed data. A simulation model might, for example, produce average values with extremely high precision, as measured by the standard error of the mean, but this says nothing about the accuracy of those average values in an absolute sense, i.e., relative to the corresponding real-world values.

The question to be addressed in this section is: "How many independent replications of the model (each with a different random number seed) are required to obtain a desired degree of precision in model estimates of mean values of delays, flow rates, etc.?" The degree of precision will be assumed to be expressed as a confidence interval. A commonly used measure is the 95% confidence interval, which has a certain intuitive appeal.

Suppose that we run the model $n$ times, each with the same input data but with a different random number seed. From each run suppose we obtain an estimate of some response variable for the simulated period: $L_1, L_2, \ldots, L_n$. The point estimate of the overall average for the $n$ replications is

$$\bar{L} = \frac{1}{n}\sum_{i=1}^{n} L_i \tag{1}$$

and the sample variance is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(L_i - \bar{L}\right)^2 \tag{2}$$

If $n$ is sufficiently large, say at least twelve, then the sample average, $\bar{L}$, may be assumed to have a normal distribution with mean equal to the expected value $E(L_i)$ of a single replication and variance

$$s_{\bar{L}}^2 = \frac{s^2}{n} = \frac{1}{n}\cdot\frac{\sum_{i=1}^{n}\left(L_i - \bar{L}\right)^2}{n-1} \tag{3}$$

Note that the assumption of normality is supported by the central limit theorem even if the $L_i$ are not normally distributed. Note also that the variance is not known a priori, but must be approximated by the sample variance.

The above assumption allows us to obtain a confidence interval estimate for the mean response as

$$\bar{L} \pm z_{\alpha}\, s_{\bar{L}}$$

if $n$ is 30 or larger, or

$$\bar{L} \pm t_{\alpha}(n-1)\, s_{\bar{L}}$$

for $n$ less than 30, where $z_{\alpha}$ is from the standard normal tables, $1-\alpha$ is the confidence level, and $t_{\alpha}(n-1)$ is from a table of Student's t distribution with $n-1$ degrees of freedom. Both are two-sided critical values, i.e., $P(|Z| > z_{\alpha}) = \alpha$; for a 95% interval, $z_{.05} = 1.96$.

The $L_i$ can be total or average values without loss of generality. Furthermore, the fact that the components of each might be autocorrelated can be ignored for the sake of this discussion. The requirement that the $L_i$, $i = 1, \ldots, n$, be independent cannot be ignored.
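The interval can be computed directly from a set of replication results. The Python sketch below is restricted to the 95% level. It uses two-sided Student's t critical values for 1 to 29 degrees of freedom, copied from standard tables, and switches to the normal value for $n$ of 30 or more, following the rule above. The replication values in the example are made up.

```python
import math
from statistics import NormalDist, mean, stdev

# Two-sided 95% critical values t_.05(df) of Student's t, df = 1..29 (standard tables).
T_05 = (12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
        2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
        2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045)
Z_05 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal value, about 1.96


def confidence_interval_95(replications):
    """Return (L_bar, half_width) of the 95% interval for the mean of L_1..L_n."""
    n = len(replications)
    if n < 2:
        raise ValueError("need at least two independent replications")
    l_bar = mean(replications)               # Eq. (1)
    s = stdev(replications)                  # square root of Eq. (2), divisor n - 1
    s_lbar = s / math.sqrt(n)                # square root of Eq. (3)
    crit = Z_05 if n >= 30 else T_05[n - 2]  # t_.05(n - 1) sits at index (n - 1) - 1
    return l_bar, crit * s_lbar


l_bar, half = confidence_interval_95([4.2, 3.9, 4.5, 4.1, 3.8])  # hypothetical delays, min
print(f"{l_bar:.2f} +/- {half:.2f} min")
```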

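The 95% confidence interval, $\bar{L} \pm z_{.05}\, s_{\bar{L}}$ or $\bar{L} \pm t_{.05}(n-1)\, s_{\bar{L}}$, can be specified in advance either as an absolute value, say $\bar{L} \pm A$ minutes of delay, or as a fraction $B$ of the mean value, e.g., $\bar{L} \pm B\bar{L}$. In the latter case, which is a common way to express convergence, it must be realized that $\bar{L}$ is a random variable. The specified confidence interval size, $B\bar{L}$, is therefore also a random variable and not a fixed range. The usual equations for such intervals are only approximate, since they ignore this fact. This applies to Eqs. (6), (7), (9), and (11) below. In either case, one can solve for the required number of replications to achieve the specified precision. Set the half-width $c\, s_{\bar{L}}$, where $c$ is $t_{\alpha}(n-1)$ or $z_{\alpha}$, equal to $A$ (or to $B\bar{L}$) and substitute Eq. (3).

For the difference expressed as an absolute value $A$:

$$n < 30:\quad n^2 - n - \frac{\left[t_{\alpha}(n-1)\right]^2 \sum_{i=1}^{n}\left(L_i - \bar{L}\right)^2}{A^2} = 0 \tag{4}$$

$$n \ge 30:\quad n^2 - n - \frac{z_{\alpha}^2 \sum_{i=1}^{n}\left(L_i - \bar{L}\right)^2}{A^2} = 0 \tag{5}$$

For the difference expressed as a fraction $B$ of the mean:

$$n < 30:\quad n^2 - n - \frac{\left[t_{\alpha}(n-1)\right]^2 \sum_{i=1}^{n}\left(L_i - \bar{L}\right)^2}{B^2 \bar{L}^2} = 0 \tag{6}$$

$$n \ge 30:\quad n^2 - n - \frac{z_{\alpha}^2 \sum_{i=1}^{n}\left(L_i - \bar{L}\right)^2}{B^2 \bar{L}^2} = 0 \tag{7}$$

Each of these can be solved for $n$ using the quadratic formula.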
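For reference, each of Eqs. (4) to (7) can be written as $n^2 - n - K = 0$, with $c$ denoting $t_{\alpha}(n-1)$ or $z_{\alpha}$ and the sum of squares treated as known. The positive root from the quadratic formula is then as follows. It is shown for the absolute criterion; replacing $A^2$ by $B^2\bar{L}^2$ gives the fractional case.

$$n = \frac{1 + \sqrt{1 + 4K}}{2}, \qquad K = \frac{c^2 \sum_{i=1}^{n}\left(L_i - \bar{L}\right)^2}{A^2}$$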

In the cases where $n \ge 30$, the $z_{\alpha}$ value is usually assumed to be based on a known, fixed population variance $\sigma_L^2$, even though we estimate it with $s^2$. Therefore, Eqs. (5) and (7) are usually written:

$$n \ge \left(\frac{z_{\alpha}\, s}{A}\right)^2 \tag{8}$$

or

$$n \ge \left(\frac{z_{\alpha}\, s}{B\bar{L}}\right)^2 \tag{9}$$

Similarly, even when $n < 30$, it is usually assumed that the estimator, $s$, is not a function of $n$. Therefore, Eqs. (4) and (6) are usually written:

$$n \ge \left(\frac{t_{\alpha}(n-1)\, s}{A}\right)^2 \tag{10}$$

or

$$n \ge \left(\frac{t_{\alpha}(n-1)\, s}{B\bar{L}}\right)^2 \tag{11}$$

Use of the t-statistic implies the additional assumption that the $L_i$ are normally distributed.

Note, however, that in Eqs. (10) and (11), $t_{\alpha}(n-1)$ is itself a function of $n$. Therefore, $n$ must be estimated by trial: assume a value, say $n^*$, substitute $t_{\alpha}(n^*-1)$, solve for $n$, and check whether $n^*$ is sufficiently close to the computed $n$. If not, repeat until $n^*$ and $n$ are sufficiently close.
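The trial procedure can be automated. The Python sketch below is restricted to the 95% level and uses the same tabled t values as above. Rather than iterating $n^* \to n$, which can oscillate because of rounding, it steps the trial value upward and returns the first $n^*$ that satisfies the inequality. The pilot standard deviation and half-widths in the example are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical values t_.05(df) of Student's t, df = 1..29 (standard tables).
T_05 = (12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
        2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
        2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045)
Z_05 = NormalDist().inv_cdf(0.975)  # about 1.96, used once n >= 30


def critical_value(n):
    """t_.05(n - 1) for n < 30, otherwise z_.05 (the switch-over used in the text)."""
    return T_05[n - 2] if n < 30 else Z_05


def required_replications(s, half_width):
    """Smallest n satisfying Eq. (10), or Eq. (8) once n >= 30, found by trial.

    s          -- pilot estimate of the standard deviation of one replication's result
    half_width -- A for an absolute criterion, or B * L_bar for a fractional one
    """
    n_star = 2
    while n_star < (critical_value(n_star) * s / half_width) ** 2:
        n_star += 1  # trial value too small: try the next one
    return n_star


print(required_replications(0.5, 0.25))  # pilot s = 0.5 min, A = 0.25 min
print(required_replications(1.0, 0.1))
```

With $s = 0.5$ min and $A = 0.25$ min, this returns 18, whereas the z-based Eq. (8) alone would give 16. Running the function for each comparison quantity and taking the largest result is one way to apply the "maximum" rule discussed below.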

How rapidly the model converges for any given $n$ depends on how many aircraft are processed in each replication. This, in turn, determines the total number of variates randomly drawn in each replication (for a given number of variates per aircraft). Thus, it is not possible, and in fact might be wasteful, to make any blanket statements about how many replications are necessary to achieve convergence. Instead, a relationship should be developed between the number of replications, the number of aircraft per replication, and the desired confidence interval. Such a relationship could be depicted graphically as in Fig. 6 (which is schematic only), or in tabular form as in Table 1, for a given response variable expressed in units of minutes.

The convergence of the model probably depends on the variance of the particular quantity being considered. For example, a greater number of replications might be needed for average arrival delay to converge to its criterion than for average flow rate to converge to its criterion. This should be investigated. In such cases, one might choose the maximum of the numbers of replications derived for the different comparison quantities.

There is an obvious tradeoff between run length, i.e., the length of the time period being simulated, and the number of replications. Other things being equal (e.g., the level of activity), more replications will clearly be required for any desired degree of convergence if one hour is simulated than if three hours are simulated. This is why it is desirable to express the number of required replications as a function of the level of activity and the length of the time interval being simulated.

Pseudo Random Number Generator

The contractor should document how it uses the particular pseudo random number generator contained in the model. This documentation may take the form of: (1) existing descriptions of statistical testing of the generator, as contained in library subroutine descriptions or the general literature; and (2) accepted methods of choosing different random number seeds when running the model. No new statistical testing is anticipated for this part of the validation.

The contractor has provided a copy of its random number generator routine and a series of random number streams to the model validation working subgroup. Preliminary tests of the provided streams indicate that they are satisfactory from the standpoint of serial independence and goodness of fit to a uniform distribution on the interval (0, 1).
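Checks of the kind mentioned above can be sketched in a few lines. The Python sketch below computes two statistics: a chi-square goodness-of-fit statistic against a uniform distribution over equal-width bins, and a lag-1 serial correlation coefficient. Python's own `random.Random` stream stands in for the contractor's streams, which are not reproduced here. The bin count, seed, and reference values in the comments are assumptions about how such a test might be set up, not the subgroup's actual procedure. The reference values are 16.919, the 95th percentile of chi-square with 9 degrees of freedom, and roughly 1.96/sqrt(N) for the serial correlation under independence.

```python
import math
import random


def chi_square_uniform(u, bins=10):
    """Chi-square statistic for u against a uniform distribution on [0, 1)."""
    counts = [0] * bins
    for x in u:
        if not 0.0 <= x < 1.0:
            raise ValueError(f"value {x} outside [0, 1)")
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(u) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def lag1_serial_correlation(u):
    """Sample autocorrelation at lag 1 (divisor N in numerator and denominator)."""
    n = len(u)
    m = sum(u) / n
    c0 = sum((x - m) ** 2 for x in u) / n
    c1 = sum((u[t] - m) * (u[t + 1] - m) for t in range(n - 1)) / n
    return c1 / c0


stream = random.Random(1977)       # stand-in for one contractor-supplied stream
u = [stream.random() for _ in range(10_000)]
chi2 = chi_square_uniform(u)       # compare with 16.919 (chi-square, 9 df, 95%)
r1 = lag1_serial_correlation(u)    # compare |r1| with about 1.96 / sqrt(N)
print(f"chi-square = {chi2:.2f}, r1 = {r1:.4f}, bound = {1.96 / math.sqrt(len(u)):.4f}")
```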


TABLE 1 - MODEL CONVERGENCE AS A FUNCTION OF RUN SIZE
(Table entries are the numbers of required replications for a given comparison variable, by number of aircraft processed per replication and desired 95% confidence interval, min.)

Model Output Responses to be Tested

As described in the validation plan, the following are the output responses to be compared with corresponding observed data (arranged in order of importance):

(1) arrival threshold flow rates (by time interval)

(2) departure threshold (roll point) flow rates (by time interval)

(3) arrival airspace delay (by time interval and aircraft)

(4) taxi-in time (by time interval and aircraft)

(5) taxi-out time (by time interval and aircraft)

(6) departure-queue size (by time interval)

(7) penalty-box delay (by aircraft)

These quantities are computed for individual five-minute intervals or individual aircraft, or both, as indicated above. Consider, for example, arrival delays. The model should output estimates of arrival delays by runway, by five-minute time interval, and by individual flight number.

In testing the model output against real-world data, we may use time intervals longer than five minutes and groups of flights larger than one. By obtaining five-minute and one-flight delay summaries, however, we will have sufficient flexibility in choosing interval length (multiples of 5 min.) and group size. We will thus be able to choose an optimal (in some sense) combination of interval size, which affects the reliability of individual observations, and the number of intervals, which constitutes the sample size for subsequent statistical tests. Similarly, we will be able to choose efficient combinations of flight-group size and the number of flight groups.
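Because both the model output and the reduced field data are kept at five-minute and single-flight resolution, longer intervals and larger flight groups can be formed afterwards. A minimal Python sketch follows. It assumes three conventions that are not part of the plan: counts (for flow rates) are summed, per-flight delays are averaged, and an incomplete final interval or group is dropped.

```python
def aggregate_intervals(five_minute_counts, k):
    """Sum consecutive five-minute counts into intervals of 5*k minutes.

    A trailing partial interval is dropped (assumed convention).
    """
    full = len(five_minute_counts) // k
    return [sum(five_minute_counts[i * k:(i + 1) * k]) for i in range(full)]


def flight_group_means(per_flight_delays, g):
    """Average per-flight delays over consecutive groups of g flights."""
    full = len(per_flight_delays) // g
    return [sum(per_flight_delays[i * g:(i + 1) * g]) / g for i in range(full)]


# Three hours of five-minute data give 36 intervals; 15-minute intervals (k = 3)
# give 12 observations per series, 30-minute intervals (k = 6) give only 6.
counts = [4, 5, 3, 6, 5, 4, 2, 5, 6]                       # hypothetical arrival counts
print(aggregate_intervals(counts, 3))                      # [12, 15, 13]
print(flight_group_means([2.0, 4.0, 6.0, 8.0, 1.0], 2))    # [3.0, 7.0]
```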

Table 2 summarizes the model outputs and the necessary detailed specifications for the model estimates.

Data Reduction of Output Responses

The reduction of the observed data should result in real-world output responses identical in definition to the model output specifications of Table 2, hence the title "Model and Real-World Output Response."

From the comments column of Table 2, it is clear that delays must be tallied by time interval (more specifically, the time interval in which they are finally calculated) in the same way in the model as in the observed data reduction and delay calculations. This could present certain problems that can be avoided by testing delays tallied by individual flight number. In the latter case, delays for groups of individual flights can be averaged together. Hence, the two series to be compared could be denoted $X_i$, $i = 1, \ldots, m$, and $Y_i$, $i = 1, \ldots, m$, where $i$ represents the individual flight group, the $X_i$ are model estimates, and the $Y_i$ are calculated from the collected data. These two series are then treated as time series.

It is expected that arrival delays, ground travel times, and penalty-box delays will be treated on an individual-flight basis in addition to the time-interval basis. Which approach is best will have to be judged after attempting the tests and seeing what problems are encountered with each.

Nature of the Statistical Hypothesis Tests

The Hsu-Hunter method of time-series analysis³ will be used in the hypothesis testing of the model estimates against observed data. Computer programs will be available to aid in conducting the statistical comparisons. To facilitate the process, the required comparison data of Table 2 should be punched on cards in the reduction of the model's detailed (individual aircraft) output tape.

Unfortunately, it is not feasible to obtain data on a large number of days that constitute independent and identical conditions. Because conditions and runway configurations are so variable at O'Hare, we are probably constrained to compare observed data for individual days to model outputs for those days.

³Hsu, D. A., and J. S. Hunter, "Analysis of Simulation-Generated Responses Using Autoregressive Models," Management Science, Vol. 24, No. 2, October 1977, pp. 181-190.
Model outputs are averaged over, say, $n$ different replications to obtain the desired model convergence, as discussed earlier. The sample size for the model estimates is, therefore, $n$-fold larger than the sample of observed data. The two samples are depicted in Table 3.

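The first step of the Hsu-Hunter method is to obtain an autoregressive time series model of order p for the observed data as follows:

$$y_t = \phi_1^* y_{t-1} + \phi_2^* y_{t-2} + \cdots + \phi_p^* y_{t-p} + b_t, \qquad t = p+1, \ldots, m \qquad (12)$$

where $y_t \equiv Y_t - \bar{Y}$ and $\bar{Y} = \frac{1}{m}\sum_{t=1}^{m} Y_t$, so that $E\{y_t\} = 0$ for all t,

$\phi_i^*$, i = 1, ..., p = autoregressive coefficients for the observed data,

$b_t$ = normally distributed random error term with mean 0 and variance $\sigma_2^2$.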
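As a hedged illustration of step 1, the sketch below fits the autoregressive model of Eq. (12) by the Yule-Walker method, solving the equations with the Levinson-Durbin recursion. The choice of Yule-Walker estimation is our assumption for illustration only. The Hsu-Hunter programs may use least squares or maximum likelihood instead. The function names (`autocovariances`, `levinson_durbin`, `fit_ar`) are hypothetical.

```python
def autocovariances(series, max_lag):
    """Sample autocovariances C_0, ..., C_max_lag, using divisor m = len(series)."""
    m = len(series)
    mean = sum(series) / m
    dev = [y - mean for y in series]
    return [sum(dev[t] * dev[t + s] for t in range(m - s)) / m
            for s in range(max_lag + 1)]


def levinson_durbin(acov, p):
    """Solve the Yule-Walker equations for an AR(p) model.

    Returns (phi, sigma2): phi[i] is the coefficient of lag i + 1 and
    sigma2 is the implied variance of the random error term.
    """
    if acov[0] <= 0:
        raise ValueError("series has zero variance")
    phi = []
    sigma2 = acov[0]
    for k in range(1, p + 1):
        # Partial autocorrelation (reflection coefficient) at lag k.
        num = acov[k] - sum(phi[i] * acov[k - 1 - i] for i in range(k - 1))
        refl = num / sigma2
        # Update the lag 1..k-1 coefficients, then append the lag-k one.
        phi = [phi[i] - refl * phi[k - 2 - i] for i in range(k - 1)] + [refl]
        sigma2 *= 1.0 - refl * refl
    return phi, sigma2


def fit_ar(series, p):
    """Yule-Walker estimates of phi_1..phi_p and of the error variance."""
    if not 0 < p < len(series):
        raise ValueError("order p must satisfy 0 < p < m")
    return levinson_durbin(autocovariances(series, p), p)
```

For example, `fit_ar(observed, 2)` would return estimates of $(\phi_1^*, \phi_2^*)$ and $\sigma_2^2$ for a hypothetical list `observed` of interval values $Y_t$.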
Before fitting the autoregressive model, one can gain insight into a time series by estimating its autocorrelation function. Consider a time series $Y_t$ that has the properties of a covariance-stationary process, namely that neither the covariance nor the expected value of the time series is a function of time. The autocovariance function of $Y_t$ is defined as

$$\operatorname{Cov}(Y_t, Y_{t+s}) = E\{[Y_t - E(Y_t)]\,[Y_{t+s} - E(Y_{t+s})]\} .$$

The assumption of stationarity implies that $\operatorname{Cov}(Y_t, Y_{t+s})$ depends only on s and not on t. The autocovariance function can be estimated using the following estimator for the "sample covariance of lag s":

$$C_s = \frac{1}{m}\sum_{t=1}^{m-s} (Y_t - \bar{Y})(Y_{t+s} - \bar{Y}) .$$

Note that $C_0$ is the sample variance of $Y_t$ based on a divisor m instead of the usual divisor m - 1. The autocorrelation function, $\rho_s$, obtained from

$$\rho_s = \frac{C_s}{C_0},$$

is a measure of the linear dependence of $Y_t$ on its past history. The function $\rho_s$ is equal to unity for s = 0 and lies in the interval [-1, 1] for all other values of s.

TABLE 3 - SAMPLES OF MODEL ESTIMATES AND OBSERVED DATA FOR INDIVIDUAL DAYS

The  autocorrelation  function  is  used  in  this  validation  to  gain  insight 
into  the  two  time  series  being  compared  and  to  help  decide  on  the  order,  p, 
of  the  autoregressive  time  series  models  described  below.  We  will  also  compare 
the  autocorrelation  functions  for  the  model  estimates  to  the  ones  for  observed 
data  by  plotting  each  one  as  a function  of  s. 
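The following is a minimal sketch of the autocorrelation estimate $\rho_s = C_s / C_0$, with $C_s$ using divisor m. It also includes a crude text "plot" of $\rho_s$ against s, so that the model and observed functions can be compared side by side. The function names are hypothetical. A graphical plotting package would normally replace `print_correlogram`.

```python
def autocorrelation(series, max_lag):
    """rho_s = C_s / C_0 for s = 0..max_lag, with C_s using divisor m."""
    m = len(series)
    if not 0 <= max_lag < m:
        raise ValueError("max_lag must be between 0 and m - 1")
    mean = sum(series) / m
    dev = [y - mean for y in series]
    c = [sum(dev[t] * dev[t + s] for t in range(m - s)) / m
         for s in range(max_lag + 1)]
    if c[0] == 0:
        raise ValueError("series has zero variance")
    return [cs / c[0] for cs in c]


def print_correlogram(rho, label):
    """Crude text plot of rho_s against s, one row per lag."""
    print(label)
    for s, r in enumerate(rho):
        bar = "#" * int(round(abs(r) * 20))
        print(f"{s:3d} {r:+.3f} {'-' if r < 0 else '+'}{bar}")
```

Calling `print_correlogram(autocorrelation(model_series, 10), "model")` and then the same for a hypothetical `observed_series` puts the two functions on a common scale for visual comparison.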

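For the estimates output by the simulation model, we have the autoregressive time series model in Table 4, in which

$$x_t^{(j)} = X_t^{(j)} - \bar{X} \quad\text{and}\quad \bar{X} = \frac{1}{nm}\sum_{j=1}^{n}\sum_{t=1}^{m} X_t^{(j)} .$$

TABLE 4 - AUTOREGRESSIVE TIME SERIES MODEL FOR THE MODEL ESTIMATES

Replication j (j = 1, ..., n):

$$x_t^{(j)} = \phi_1 x_{t-1}^{(j)} + \phi_2 x_{t-2}^{(j)} + \cdots + \phi_p x_{t-p}^{(j)} + a_t^{(j)}$$

Pooled:

$$\sum_{j=1}^{n} x_t^{(j)} = \phi_1 \sum_{j=1}^{n} x_{t-1}^{(j)} + \phi_2 \sum_{j=1}^{n} x_{t-2}^{(j)} + \cdots + \phi_p \sum_{j=1}^{n} x_{t-p}^{(j)} + \sum_{j=1}^{n} a_t^{(j)}$$

where $a_t^{(j)}$ = normally distributed random error term with mean 0 and variance $\sigma_1^2$,

and $\sum_{j=1}^{n} a_t^{(j)}$ = normally distributed random error term with mean 0 and variance $n\sigma_1^2$.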

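We can pool together the results of the n replications, as shown by the last line of Table 4, without loss of generality, except that the variance of $\sum_{j=1}^{n} a_t^{(j)}$ would not be comparable with the variance of $b_t$. If $a_t^{(j)}$ has variance $\sigma_1^2$, then $\sum_{j=1}^{n} a_t^{(j)}$ has variance $n\sigma_1^2$. Because we know that $\operatorname{Var}\{Z/\sqrt{n}\} = \frac{1}{n}\operatorname{Var}\{Z\}$, we can divide the last line through by $\sqrt{n}$ as follows:

$$\frac{1}{\sqrt{n}}\sum_{j=1}^{n} x_t^{(j)} = \phi_1 \frac{1}{\sqrt{n}}\sum_{j=1}^{n} x_{t-1}^{(j)} + \phi_2 \frac{1}{\sqrt{n}}\sum_{j=1}^{n} x_{t-2}^{(j)} + \cdots + \phi_p \frac{1}{\sqrt{n}}\sum_{j=1}^{n} x_{t-p}^{(j)} + \frac{1}{\sqrt{n}}\sum_{j=1}^{n} a_t^{(j)} \qquad (13)$$

The variance of the error term, $\frac{1}{\sqrt{n}}\sum_{j=1}^{n} a_t^{(j)}$, is then $\sigma_1^2$. This is now comparable to the variance of the observed data error term, $\sigma_2^2$.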

The foregoing arguments can be summed up by saying that the results of the n replications of the model should be added together and divided by $\sqrt{n}$ before performing the autoregression. Thus, the two autoregressive time series models to be compared in step 1 of the Hsu-Hunter method are given by Eqs. (12) and (13).
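A minimal sketch of the pooling rule follows. It centers each replication on the grand mean, sums across replications, and divides by $\sqrt{n}$. This produces the left-hand series of Eq. (13). A small Monte Carlo check shows that the scaled sum of n independent $N(0, \sigma_1^2)$ errors has variance close to $\sigma_1^2$. The independence of errors across replications is assumed here. The function names, seed, and trial count are illustrative choices.

```python
import math
import random
import statistics


def pooled_scaled_series(replications):
    """Return (1/sqrt(n)) * sum_j x_t^(j) for t = 1..m, as in Eq. (13).

    replications: n lists, each holding the m interval values X_t^(j) of
    one replication; x_t^(j) is X_t^(j) minus the grand mean over all n*m.
    """
    n = len(replications)
    m = len(replications[0])
    if any(len(rep) != m for rep in replications):
        raise ValueError("every replication must cover the same m intervals")
    grand_mean = sum(sum(rep) for rep in replications) / (n * m)
    return [sum(rep[t] - grand_mean for rep in replications) / math.sqrt(n)
            for t in range(m)]


def scaled_error_variance(n, sigma, trials=20000, seed=1):
    """Monte Carlo estimate of Var{(1/sqrt(n)) * sum of n N(0, sigma^2) errors}."""
    rng = random.Random(seed)
    draws = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / math.sqrt(n)
             for _ in range(trials)]
    return statistics.pvariance(draws)
```

With `sigma = 2.0`, the estimate stays near 4.0 whatever n is. Without the division by $\sqrt{n}$, it would grow to about 4n.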

The model series has parameters $\boldsymbol{\phi}_1 = (\phi_1, \phi_2, \ldots, \phi_p)$ and $\sigma_1^2$, while the observed series has $\boldsymbol{\phi}_2 = (\phi_1^*, \phi_2^*, \ldots, \phi_p^*)$ and $\sigma_2^2$. The parameters of the inferential statistic $G(\boldsymbol{\psi}, \gamma)$ are $\boldsymbol{\psi} = \boldsymbol{\phi}_1 - \boldsymbol{\phi}_2$ and $\gamma = \sigma_2^2/\sigma_1^2$. In the following discussion, p is assumed to be strictly less than m. In previous applications p has usually been 2 or 3, depending on the nature of the autocorrelation function; more will be said about this later.

Hsu has shown that $G(\boldsymbol{\psi}, \gamma)$, a statistic derived from the joint probability density function of $\boldsymbol{\psi}$ and $\gamma$, is asymptotically distributed as $\chi^2(p+1)$, i.e., as a chi-square variable with p + 1 degrees of freedom.⁴

⁴Details of the derivation of the $\chi^2(p+1)$ asymptotic distribution are available in Hsu, D. A., Stochastic Instability in the Behavior of Stock Prices, unpublished Ph.D. dissertation, Department of Statistics, University of Wisconsin, Madison, May 1973.


The inferential statistic $G(\mathbf{0}, 1)$ is used to test the hypothesis that $\boldsymbol{\psi} = \mathbf{0}$ and $\gamma = 1$. It thus tests, simultaneously, for a difference in the autoregressive parameters and in the residual variance between the two time series. Based on experience with Monte Carlo experiments, Hsu and Hunter recommend that the $\chi^2$ approximation is satisfactory when the length of both time series, m, is no less than 60.⁵

⁵Hsu and Hunter, p. 185.

The second step of the Hsu-Hunter method is to compare the means of the two time series under consideration. This step uses an inferential statistic distributed as a Student's t random variable. It treats the values of the autoregressive parameters (from step 1) as known quantities.

For comparison of means, we have the two autoregressive time series models of order p. The model equation is the pooled equation of Table 4 divided through by n and written in terms of the replication average $\bar{x}_t = \frac{1}{n}\sum_{j=1}^{n} x_t^{(j)}$:

$$\bar{x}_t = \phi_1 \bar{x}_{t-1} + \phi_2 \bar{x}_{t-2} + \cdots + \phi_p \bar{x}_{t-p} + a_t, \qquad a_t = \frac{1}{n}\sum_{j=1}^{n} a_t^{(j)},$$

where $a_t$ has mean 0 and variance $\sigma_1^2/n$, and

$$y_t = \phi_1^* y_{t-1} + \phi_2^* y_{t-2} + \cdots + \phi_p^* y_{t-p} + b_t .$$

The sample means of the two original time series are

$$\bar{X} = \frac{1}{nm}\sum_{j=1}^{n}\sum_{t=1}^{m} X_t^{(j)} \quad\text{and}\quad \bar{Y} = \frac{1}{m}\sum_{t=1}^{m} Y_t .$$

Denote the two population means as $\mu_1$ and $\mu_2$, respectively. Write $\bar{X}_t = \frac{1}{n}\sum_{j=1}^{n} X_t^{(j)}$ for the replication average at time t. The two series above can then be rewritten:

$$\bar{X}_t - \mu_1 = \phi_1(\bar{X}_{t-1}-\mu_1) + \phi_2(\bar{X}_{t-2}-\mu_1) + \cdots + \phi_p(\bar{X}_{t-p}-\mu_1) + a_t$$

and

$$Y_t - \mu_2 = \phi_1^*(Y_{t-1}-\mu_2) + \phi_2^*(Y_{t-2}-\mu_2) + \cdots + \phi_p^*(Y_{t-p}-\mu_2) + b_t$$

for t = p + 1, ..., m.

We now obtain a transformed variable. All terms of the above equations that involve $\mu_1$ or $\mu_2$ are shifted to the right side, and the $\bar{X}$'s and $Y$'s to the left, as follows:

$$u_t \equiv \bar{X}_t - \phi_1\bar{X}_{t-1} - \cdots - \phi_p\bar{X}_{t-p} = (1-\phi_1-\cdots-\phi_p)\,\mu_1 + a_t$$

(the symbol "≡" means "by definition" or "is defined as") and

$$u_t^* \equiv Y_t - \phi_1^*Y_{t-1} - \cdots - \phi_p^*Y_{t-p} = (1-\phi_1^*-\cdots-\phi_p^*)\,\mu_2 + b_t .$$

Dividing both sides of the above equations by the coefficients of $\mu_1$ and $\mu_2$, i.e., by $(1-\phi_1-\cdots-\phi_p)$ and $(1-\phi_1^*-\cdots-\phi_p^*)$, respectively, gives the desired transformed variables:

$$w_t \equiv \frac{u_t}{1-\phi_1-\cdots-\phi_p} = \mu_1 + c_t$$

and

$$w_t^* \equiv \frac{u_t^*}{1-\phi_1^*-\cdots-\phi_p^*} = \mu_2 + d_t ,$$

where $c_t$ is a normally distributed residual with mean 0 and variance $\sigma_1^2/[n(1-\phi_1-\cdots-\phi_p)^2]$, and $d_t$ is normally distributed with mean 0 and variance $\sigma_2^2/(1-\phi_1^*-\cdots-\phi_p^*)^2$.

The transformed variables $w_t$ and $w_t^*$ are independent normal variables. Their means are $\mu_1$ and $\mu_2$, and their variances are $\sigma_1^2/[n(1-\phi_1-\cdots-\phi_p)^2]$ and $\sigma_2^2/(1-\phi_1^*-\cdots-\phi_p^*)^2$, respectively.

Thus, we now have two transformed series, $\mathbf{w} = (w_{p+1}, \ldots, w_m)$ and $\mathbf{w}^* = (w_{p+1}^*, \ldots, w_m^*)$, which have means $\mu_1$ and $\mu_2$. We can express the distributional form of the inferential statistic for $(\mu_2 - \mu_1)$ as follows:

$$t = \frac{\bar{w}_2 - \bar{w}_1}{\sqrt{\dfrac{s_1^2}{m-p} + \dfrac{s_2^2}{m-p}}} \;\dot{\sim}\; g_1\, t(g_2) \qquad (14)$$

This statistic is distributed approximately as $g_1$ times a Student's t random variable with $g_2$ degrees of freedom. The quantities $s_1^2$ and $s_2^2$ are the sample variances of $\mathbf{w}$ and $\mathbf{w}^*$, respectively, and $g_1$ and $g_2$ are both functions of $s_1^2$, $s_2^2$, and m.⁶ The quantities $\bar{w}_1$ and $\bar{w}_2$ are the means of the transformed variables $\mathbf{w}$ and $\mathbf{w}^*$. Thus we can use the above t-statistic to test the hypothesis $H_0\colon \mu_1 = \mu_2$. The transformed variables $w_t$ and $w_t^*$ are serially independent normal variables, so the condition that $\bar{w}_1$ and $\bar{w}_2$ be normally distributed is satisfied. Furthermore, we have accounted for and removed the autocorrelation of the two series. This was done by dividing their residual variances by $(1-\phi_1-\cdots-\phi_p)^2$ and $(1-\phi_1^*-\cdots-\phi_p^*)^2$, respectively.
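The sketch below shows step 2 under stated assumptions. It forms the replication average, applies the transformation to $w_t$ (or $w_t^*$) with the step-1 coefficients treated as known, and computes the t-value of Eq. (14). The degrees-of-freedom functions $g_1$ and $g_2$ from Box and Tiao are not reproduced here. The use of the usual (k - 1)-divisor sample variance for $s_1^2$ and $s_2^2$ is our assumption, as are all function names.

```python
import math
import statistics


def replication_average(replications):
    """X-bar_t = (1/n) * sum_j X_t^(j) for each interval t."""
    n = len(replications)
    return [sum(values) / n for values in zip(*replications)]


def transform_series(series, phi):
    """w_t = (Y_t - phi_1*Y_{t-1} - ... - phi_p*Y_{t-p}) / (1 - phi_1 - ... - phi_p)
    for t = p+1, ..., m; phi[i] is the lag-(i+1) coefficient from step 1."""
    p = len(phi)
    denom = 1.0 - sum(phi)
    if abs(denom) < 1e-12:
        raise ValueError("1 - sum(phi) is zero; the transformation is undefined")
    return [(series[t] - sum(phi[i] * series[t - 1 - i] for i in range(p))) / denom
            for t in range(p, len(series))]


def means_t_value(w1, w2):
    """(w2bar - w1bar) / sqrt(s1^2/(m-p) + s2^2/(m-p)), with m - p = len(w1)."""
    if len(w1) != len(w2) or len(w1) < 2:
        raise ValueError("need two transformed series of equal length >= 2")
    k = len(w1)
    s1_sq = statistics.variance(w1)
    s2_sq = statistics.variance(w2)
    return (statistics.fmean(w2) - statistics.fmean(w1)) / math.sqrt(s1_sq / k + s2_sq / k)
```

In use, the model side would be `transform_series(replication_average(reps), phi_model)` and the observed side `transform_series(observed, phi_observed)`. The two results are then passed to `means_t_value`.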

Even more so than with most traditional methods of statistical analysis, it is essential that an experienced statistician be involved in the various steps of autoregressive time series analysis. In addition to the usual estimation phase, this type of analysis involves a model identification phase. That phase gives the trained statistician considerable flexibility in fitting an autoregressive model to a time series. The autocorrelation function, for example, provides a clue to choosing the order p of the autoregressive model. Furthermore, an analysis of the residuals of an autoregressive model (more precisely, of the autocorrelation function of the residuals) can point to appropriate modifications of the model, for example, to an improved identification of the model.

⁶Details of the derivation and computation of the statistic t above are available in Box, G. E. P. and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley Publishing Co., Chapter 2, p. 107.

Given the foregoing flexibility in the fitting of an autoregressive time series model, no major difficulties are expected in fitting such a model to the observed or model-generated time series of this validation effort. In the unlikely event that model-fit problems do arise, however, a simplified alternative approach is presented below.

Alternative  Methods  of  Statistical  Analysis 

In the unlikely event that difficulties are encountered in attempting to fit an autoregressive time series model, we will have to fall back on more traditional statistical hypothesis tests. These tests assume that a true random sample can be achieved and, consequently, that the items of that random sample are mutually independent; this implies that they would be uncorrelated.

In such cases, for example, a standard Student's t-test can be used to test the difference in the means of two random samples that correspond, say, to the model outputs and the observed data, respectively. Details of such a test are given here as an alternative to the Hsu-Hunter method.*

Described herein is a standard statistical test that can be used to test whether two quantities have the same mean value. The first is a set of delay estimates produced by a simulation model for a specific set of circumstances. The second is a real-world measurement taken under the same circumstances.

These delay estimates are considered to be averages taken over some time period. Care must be taken to ensure that the time period used is long enough that the model and the real world might reasonably be expected to show approximately the same behavior. The test described below can be used for average delays taken over any sufficiently long time period.

The required assumptions are as follows. The estimates produced by the simulation model must be independent and identically distributed $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are fixed but unknown. The real-world observation must be independent of the model estimates and normally distributed with the same variance, $\sigma^2$.

*This t-test description was provided by N. J. Kirkendall of The MITRE Corporation, memo to A. L. Haines, dated 5 January 1978.

Since the test described below is fairly robust to departures from normality,** the only really questionable assumption is the equality of the variances of the processes underlying the model estimates and the real-world measurement. If the simulation model were perfect, of course, the variances as well as the means would be equal. In any case, a single observation from the real world provides no information about the variance of the average delays observed in the real world.

Details. Let $x_0$ represent the real-world measurement for a given time period, and let $x_1, \ldots, x_N$ represent the N estimates from the model for that time period. Assuming that the $x_i$ are mutually independent and normally distributed with the same variance, we would like to test the following hypothesis:

$$H_0\colon\; x_0, x_1, \ldots, x_N \sim N(\mu, \sigma^2)$$

$$H_1\colon\; x_0 \sim N(\mu_1, \sigma^2);\quad x_1, \ldots, x_N \sim N(\mu, \sigma^2);\quad \mu \neq \mu_1$$

The  test  of  this  hypothesis  is  the  standard  likelihood  ratio  test  for  the 
equality  of  the  means  of  two  samples,  as  given,  for  example,  on  page  288  of 
Hogg  & Craig,  Introduction  to  Mathematical  Statistics,  2nd  ed.,  1966. 

The result is that the test with significance level α rejects $H_0$ when

$$|t| = \frac{|x_0 - \bar{x}|}{s\sqrt{1 + 1/N}} > t_{N-1}(\alpha),$$

where

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2,$$

and $t_{N-1}(\alpha)$ is chosen from tables of the Student's t distribution with N - 1 degrees of freedom to have

$$P\{|t| > t_{N-1}(\alpha)\} = \alpha .$$

This implies that confidence limits for $x_0$ are given by

$$\bar{x} \pm t_{N-1}(\alpha)\, s\sqrt{1 + 1/N} .$$

**Kendall & Stuart, The Advanced Theory of Statistics, Vol. II, 2nd ed., 1967, pp. 465-467.

Details  of  the  derivation  of  this  likelihood  ratio  test  can  be  found  in 
most  textbooks  on  mathematical  statistics. 
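A minimal sketch of this test follows, assuming the pooled two-sample form given above. In that form, one sample is the single observation $x_0$, so all of the variance information comes from the N model estimates. The critical value $t_{N-1}(\alpha)$ is passed in as read from tables, as the text prescribes. The function name is hypothetical.

```python
import math
import statistics


def single_observation_test(x0, estimates, t_crit):
    """Compare one real-world value x0 with N model estimates x_1..x_N.

    t_crit is t_{N-1}(alpha) from Student's t tables (N - 1 degrees of
    freedom).  Returns (t, reject_h0, (lower_limit, upper_limit)).
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two model estimates")
    xbar = statistics.fmean(estimates)
    s = statistics.stdev(estimates)          # divisor N - 1
    if s == 0:
        raise ValueError("model estimates have zero spread")
    scale = s * math.sqrt(1.0 + 1.0 / n)     # standard error of x0 - xbar
    t = (x0 - xbar) / scale
    limits = (xbar - t_crit * scale, xbar + t_crit * scale)
    return t, abs(t) > t_crit, limits
```

For instance, with three model estimates (2 degrees of freedom) and α = 0.05, the tabled two-sided value is 4.303. In that case `single_observation_test(15.0, [10.0, 12.0, 14.0], 4.303)` gives t of about 1.30 and does not reject $H_0$.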


It should be emphasized that this method compares a single model average for each replication for a given time period with a single observed value. It does not compare a time history of model estimates for each replication (say, by five-minute interval) with a time history of observed data, as described earlier. Thus, this alternative method does not provide as much information as the time series comparisons of the Hsu-Hunter method. Nevertheless, it is presented here as a possible alternative in case difficulties are encountered in attempting to apply the Hsu-Hunter method.


It is recommended that the results of the tests not be judged on the basis of a priori significance levels, e.g., the 0.05 level. Instead, significance probabilities should be estimated for each test, and the entire set of significance probabilities should then be judged by the Model Validation Group as described below. Technical advisors on the validation working subgroup will provide assistance in interpreting the results of the tests.

Another important consideration in judging the statistical test results is that not all response variables are equally important. Some are more important than others in the types of decisions made by airport operators, airlines, and FAA. This will be taken into account in the evaluation of results by the Model Validation Group.

A significance probability (usually denoted as P) is the probability of obtaining a value of the inferential statistic (a $\chi^2$-value or t-value) at least as large as the one computed in the test, given that the hypothesis tested is actually true.⁷ If this probability is very small, one would tend to reject, or at least suspect, the hypothesis. If the significance probability is not small, then one would tend not to reject the hypothesis. The conclusion in the latter case might be that the differences obtained in the test are due to chance rather than to defects in the model. No precise universal definition of "small" can be offered; this is a matter of judgement and of confidence in the statistical methods employed.

Further insight into the test results may be obtained by considering the results of the whole set of tests. Suppose, for example, that test results in the form of significance probabilities, $P_{ij}$, are tabulated as shown below, where i is the test number and j is the comparison variable.

                              Comparison Variable
                      1       2      ...      j      ...      ℓ
        Test No.  1
                  2
                  .
                  i                         P_ij
                  .
                  k

Suppose the hypotheses tested were all really true. Then at any arbitrary significance level, α, one would expect the number of tests that failed (i.e., the number of $P_{ij} < \alpha$) not to exceed about (100α) percent of the total number of tests, kℓ. Thus not more than about 5% of the $P_{ij}$ should be less than 0.05, not more than about 10% should be less than 0.10, and so on, by definition of significance probability.⁸

⁷Recall that the general nature of the hypotheses tested in this validation is that there is no difference between the model estimates and the observed data.

The  foregoing  percentages  are  not  suggested  as  hard-and-fast  criteria  for 
acceptance  of  the  model,  because  the  corresponding  tests  are  not  all  identical; 
besides,  the  number  of  tests  will  probably  not  be  large.  The  above  ideas  do, 
however,  provide  one  means  of  roughly  interpreting  the  results  of  the  whole 
set  of  tests  to  be  performed  in  this  validation. 
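The rough check described above can be sketched as follows. It counts the share of the $P_{ij}$ below each α and places it next to the nominal α, for the whole table or for one comparison variable (column) when k is large. The table layout (rows = tests, columns = comparison variables) follows the tabulation above. The function names are hypothetical.

```python
def fraction_below(p_table, alpha):
    """Share of all P_ij (rows = tests i, columns = comparison variables j) below alpha."""
    values = [p for row in p_table for p in row]
    if not values:
        raise ValueError("empty table")
    return sum(p < alpha for p in values) / len(values)


def column_fraction_below(p_table, j, alpha):
    """Same check within comparison variable j (meaningful only when k is large)."""
    column = [row[j] for row in p_table]
    return sum(p < alpha for p in column) / len(column)


def rough_summary(p_table, alphas=(0.05, 0.10)):
    """Map each alpha to (observed failure fraction, nominal fraction alpha)."""
    return {alpha: (fraction_below(p_table, alpha), alpha) for alpha in alphas}
```

As the text cautions, an observed fraction somewhat above α is a prompt for judgement, not an automatic rejection of the model.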

V.  SUMMARY  DESCRIPTION  OF  STEPS  IN  STATISTICAL  ANALYSIS  OF  DATA 


Outlined below are the steps to be followed to accomplish the statistical analysis suggested in the foregoing sections:

I. Model Convergence
A.  Prerequisites: 

The model convergence, as a function of the number of aircraft processed per replication and the number of replications, can be studied as part of the sensitivity analysis, perhaps using the LaGuardia data set. One caveat, however, is that the results should be spot-checked later to see if they hold for the O'Hare data set. Thus the only prerequisite is that the model be running for the LaGuardia input data. We must also know, however, the number of aircraft processed in each replication, which in turn affects the number of random variates drawn each time.

⁸If k were very large, one could check this, more appropriately, within each column: $P_{1j}, \ldots, P_{kj}$, for j = 1, ..., ℓ. This will probably not be the case, however.

B.  Procedure: 

The model should be run for a large number of replications, say at least 30. Cumulative⁹ averages, Eq. (1), should be computed for each response variable after every 5 replications. Also, compute the cumulative sample variance, Eq. (2), after every 5 replications.

This  will  yield  the  results,  illustrated  in  the  table  below,  for 
each  response  variable  and  for  each  activity  level,  i.e.,  for  each 
number  of  aircraft  processed. 

One or the other of the last two columns of the following table contains the 95% confidence interval for a given number of replications. These intervals can then be compared to an a priori confidence interval, and an appropriate number of replications can thereby be selected. To aid this process, a graph similar to Fig. 6 can be prepared.

        n = No. of      Sample Mean   Sample Variance   95% Confidence Interval
        Replications    (Eq. 1)       (Eq. 2)           for n < 30    for n ≥ 30
        5
        10
        ...

⁹The term "cumulative" here means for all previous replications at each stage and not just for the groups of 5.
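The following is a minimal sketch of how one response variable's rows of the table above might be produced. It assumes that Eq. (1) and Eq. (2), which are not reproduced in this section, are the ordinary cumulative sample mean and the (n - 1)-divisor sample variance. It also assumes the confidence interval is reported as a ± half-width, using Student's t for n < 30 and the normal value 1.96 for n ≥ 30. The t values are standard two-sided 95% table entries for n - 1 degrees of freedom. The names are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student's t critical values with n - 1 degrees of freedom,
# for the cumulative sample sizes n = 5, 10, 15, 20, 25 (standard table values).
T_975 = {5: 2.776, 10: 2.262, 15: 2.145, 20: 2.093, 25: 2.064}
Z_975 = 1.96  # normal approximation used for n >= 30


def convergence_rows(values):
    """Cumulative mean, variance and 95% half-width after every 5 replications.

    values: one response variable, one entry per replication, in run order.
    Returns (n, mean, variance, half_width_t, half_width_z) rows; exactly one
    of the last two is filled, depending on whether n < 30.
    """
    rows = []
    for n in range(5, len(values) + 1, 5):
        sample = values[:n]                 # all replications so far
        mean = statistics.fmean(sample)
        var = statistics.variance(sample)   # divisor n - 1
        se = math.sqrt(var / n)
        if n < 30:
            rows.append((n, mean, var, T_975[n] * se, None))
        else:
            rows.append((n, mean, var, None, Z_975 * se))
    return rows
```

Running this once per activity level (number of aircraft processed) gives the C-values to be entered on the grid of Fig. 7.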

Note that the curves of Fig. 6 are essentially contours of equal-confidence-interval values. These can be plotted as follows:

(1) Create a grid made up of horizontal lines at 5-replication intervals and vertical lines corresponding to the different activity levels, say A, B, C, etc. - see Fig. 7.

(2) Record the computed confidence interval for each combination of number of replications and number of aircraft processed at the corresponding intersection on the grid - see the C-values of Fig. 7.

(3) Interpolate contours on the grid for convenient confidence intervals, say rounded integer values - see, for example, the 5-, 10- and 15-min. contours of Fig. 6.
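Step (3) can be sketched as linear interpolation along each vertical grid line. The sketch finds the number of replications at which the recorded confidence interval crosses a chosen contour value. The contour of Fig. 6 is then drawn through these points across activity levels. Linear interpolation is our assumption; a smoother curve could be fitted by hand. The function names and grid layout are hypothetical.

```python
def replications_for_target(column, target):
    """Interpolated number of replications at which the confidence interval
    equals `target`, for one activity level.

    column: [(replications, confidence_interval), ...] sorted by replications.
    Returns None if the target is not bracketed by the grid values.
    """
    for (n0, c0), (n1, c1) in zip(column, column[1:]):
        if c0 != c1 and (c0 - target) * (c1 - target) <= 0:
            return n0 + (c0 - target) / (c0 - c1) * (n1 - n0)
    return None


def contour(grid, target):
    """One equal-confidence-interval contour: {activity level: replications}."""
    return {level: replications_for_target(col, target)
            for level, col in grid.items()}
```

For a column recorded as 20, 10 and 6 minutes at 5, 10 and 15 replications, the 15-minute contour crosses that activity level at 7.5 replications.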

This  may  seem  a tedious  process,  but  the  result  is  valuable: 
namely,  an  approximate  guide  for  choosing  the  number  of  runs  for 
using  the  model  at  any  airport  given  specifications  on  convergence, 
the  approximate  activity  level  under  consideration,  say  in  aircraft 
per  hour,  and  the  length  of  time  period  simulated  in  hours;  note 
that  the  abscissa  of  Fig.  7 is  the  product  of  these  latter  two 
quantities. 

The  convergence  results,  as  expressed  in  Figs.  6 and  7,  would 
be  translated  into  a simplified  set  of  guidelines  for  using  the 
model  in  terms  of  the  approximate  number  of  replications  desirable 
for  different  ranges  of  activity  levels. 

One question that needs to be addressed is whether conclusions about convergence at one airport are transferable to other airports. Those airports may have a different pattern of operations and, hence, different types of aircraft interactions. The answer to this question is not, a priori, obvious. It should be investigated by comparing the LaGuardia convergence results with the O'Hare results.

Fig. 7 Illustration of Plotting Contours of Equal Confidence Interval (vertical axis: No. of Replications; horizontal axis: No. of Aircraft Processed per Replication)
II. Statistical Analysis

A.  Prerequisites: 

1. Selection of final set of output responses to be validated - see Table 2.

2. Set up the Hsu-Hunter computer programs so that they run on the NAFEC computer. Have Dr. Hsu instruct NAFEC personnel on using the programs. The computer program used to execute the Hsu-Hunter method produces the following results: (1) the value of $G(\mathbf{0}, 1)$ and its significance level, determined from an appropriate $\chi^2$ distribution; (2) the values of $G(\boldsymbol{\psi}, 1)$ and $G(\mathbf{0}, \gamma)$, and their significance levels compared with appropriate $\chi^2$ variables; (3) the t-value for testing the difference in means, and its significance level; and (4) supplementary information, including estimates of $\mu$, $\phi$ and $\sigma^2$ for both observed and generated series. The preliminary stage of model identification using Box-Jenkins techniques, however, requires a separate subroutine package that is operational at Princeton University. An attempt will be made to get this preliminary subroutine running at NAFEC.

3. Prepare computer programs for reading the required data from the contractor's simulation model output tape, i.e., the individual-aircraft output. The comparison variables of Table 2 are to be defined from that tape.
B. Procedure

1. Prepare graphical and tabular summaries of the model output and the corresponding real-world quantities. Perform visual comparisons of the two for each response variable. This will consist mainly of looking at the summary output of the model and corresponding summaries of observed data.

2. Choose appropriately sized time intervals (or aircraft group sizes) for the time-series analysis. Keep in mind that the number of intervals (groups) is the sample size for subsequent analysis. This number should be as large as possible. However, another important consideration is the number of events (aircraft) in each interval (group).

Thus there is a tradeoff here. Given the way random variates are generated in the model, it may not be reasonable to expect the model to duplicate the real world accurately on an aircraft-by-aircraft or minute-by-minute basis.

It is appropriate, however, to expect the model to provide reasonable estimates for larger time intervals, say five to ten minutes, or for larger aircraft group sizes, say five to ten aircraft. The problem here is to choose an optimal (or at least a good) combination of sample size and number of items in each sample element. Thus, a certain amount of data evaluation and experimental design is required before the actual time series analysis can proceed. Note that this second step must be performed for each output response (comparison) variable of Table 2, for both the model estimates and the observed data. Of course, any given comparison variable will have a common sample size for model and observed values. Once the two time series are defined for each comparison variable, the remaining steps, described below, can proceed on each one.

3. Compute the autocorrelation function, $\rho_s$, for the model series and for the observed data series (see p. 28). Choose the order, p, for the autoregressive time-series model based on the form of the two autocorrelation functions, say as determined from a graphical plot of each one vs. s.

4.  Perform  step  one  of  the  Hsu-Hunter  method  - see  pp.  26-30. 

5.  Perform  step  two  of  the  Hsu-Hunter  method  - see  pp.  30-32. 

6.  Determine  significance  probabilities  for  each  of  the  two 
above  steps. 

7. Repeat steps 3 through 6 for all response variables and all simulated periods. Note that there are essentially two tests for each output response variable and each simulated period, corresponding to step one and step two of the Hsu-Hunter method. The results of all of these test-pairs should be summarized in tabular form as on page 37. In this table, the different comparison variables should be clearly identified and labeled, and so should the different simulated periods. This will facilitate later decision making. Some comparison variables are more important than others, and some time periods are more reliable than others. The table lets these facts be taken into account in making judgements about the collective outcomes of the tests, as described on pp. 35-37.


8. Apply the standard t-test of the equality of means of two random samples in place of, or in addition to, the Hsu-Hunter comparisons, particularly if difficulties are encountered in fitting an autoregressive model.

9. The foregoing is only one measure of the model's ability to simulate airfield and airspace operations. In making the final judgement as to the adequacy of the model for its intended applications, the results of these statistical comparisons will have to be weighed along with other evidence. That evidence includes graphical and tabular comparisons, the results of sensitivity analyses, and the evaluation of model logic.

C.  Priority  Ranking  of  Comparison  Variables: 

1.  The  variables  of  Table  2 may  be  priority  ranked  according  to 
two  main  criteria: 

(a)  their  importance  as  figures  of  merit  of  the  airfield, 
and 

(b)  how  accurately  it  is  felt,  a priori,  the  model  should 
estimate  the  variables  given  the  nature  of  the  input 
data  for  validation. 

2.  The  following  priority  ranking  is  selected: 

(a) arrival threshold flow rates

(b)  departure  threshold  (roll)  flow  rates 

(c)  arrival  airspace  delay 

(d)  taxi-in  time 

(e) taxi-out time

(f)  departure  queue  size 

(g)  penalty-box  delay 

3. It is further suggested that the foregoing variables be obtained for 5-minute intervals from both the model output and the observed data. This will enable us to use the summary output format from the model, but for each 5-minute interval instead of each hour, which is the usual model output.

4.  The  5-minute  interval  data  should  be  punched  directly  on 
cards  by  both  the  model  and  the  data-reduction  programs. 

This  will  entail  inserting  an  additional  punch  statement 
and  corresponding  format  statement  in  the  model  and  data 
reduction  programs. 
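As a hedged sketch of the 5-minute aggregation suggested above, the code below bins individual-aircraft records into fixed intervals. It produces counts (for threshold flow rates) or per-interval means (for quantities such as taxi-out time). It assumes each record carries an event time in minutes from the start of the simulated or observed period. Which event time to use for each variable (e.g., threshold crossing or gate departure) is a choice the data reduction must make. The function names are hypothetical.

```python
import math


def interval_counts(event_times, start, end, width=5.0):
    """Number of events (e.g., threshold crossings) per interval of `width` minutes."""
    n_bins = int(math.ceil((end - start) / width))
    counts = [0] * n_bins
    for t in event_times:
        if start <= t < end:
            counts[int((t - start) // width)] += 1
    return counts


def interval_means(events, start, end, width=5.0):
    """Mean of a per-aircraft quantity (e.g., taxi-out time) per interval;
    None where no aircraft fall in the interval."""
    n_bins = int(math.ceil((end - start) / width))
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for t, value in events:
        if start <= t < end:
            k = int((t - start) // width)
            sums[k] += value
            counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

A count per 5-minute interval converts to an hourly flow rate by multiplying by 12. Empty intervals returned as None would need a decision (e.g., merging intervals) before the time-series analysis.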

D.  Analyzing  Differences  Between  Model  Estimates  and  Observed  Data: 

1.  An  important  model  application  is  to  investigate  differences 
among  alternative  improvements  and  runway-use  configurations. 
It  is,  therefore,  of  interest  in  the  validation  to  test  the 
model's  ability  to  estimate  such  differences.  This  will 
be  attempted  by  testing  differences  in  two  time  series  as 
estimated  by  the  model  versus  the  corresponding  differences 
in two time series from the observed data. In each case, a new time series, say $Z_t$, will be obtained as the difference between two time series, i.e.,


where the superscripts, (1) and (2), refer to two different configurations or two alternative improvements. We will explore with D. A. Hsu the problems associated with analyzing such a time series.
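A minimal sketch of forming such a difference series is shown below. It assumes the two series are aligned interval by interval, e.g., the same 5-minute intervals under alternative (1) and alternative (2). The same function would be applied once to the model estimates and once to the observed data, and the two resulting series compared as before. The function name is hypothetical.

```python
def difference_series(series_1, series_2):
    """Z_t = value under alternative (1) minus value under alternative (2),
    interval by interval; both series must cover the same intervals."""
    if len(series_1) != len(series_2):
        raise ValueError("series must cover the same intervals")
    return [a - b for a, b in zip(series_1, series_2)]
```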